Automatically Deciding if a Document was Scanned or Photographed

Authors

  • Gabriel Pereira e Silva
  • Marcelo Thielo
  • Rafael Dueire Lins
  • Brenno Miro
  • Steven J. Simske
Abstract

Portable digital cameras are widely used by students and professionals in many fields as a practical way to digitize documents. Tools such as PhotoDoc enable batch processing of such documents, performing automatic border removal and perspective correction. A document processed by PhotoDoc and a scanned one look very similar to the human eye if both are in true color. However, when a batch of documents is binarized automatically, those digitized with portable cameras exhibit different features from those digitized with scanners. Knowing the source of each document is therefore fundamental for successful processing. This paper presents a classification strategy to distinguish between scanned and photographed documents. Over 16,000 documents were tested, with a correct classification rate of over 99.96%.

Similar resources

P.s.a. for Hiding Scanned Documents within Images

PSA stands for Pixel Scrambling Algorithm. This algorithm was published early this year [Potdar, Chang 2004]. This paper shows a practical application of PSA and how we can hide a scanned document within an image. The scanned document could be either handwritten or printed. The main idea behind this algorithm is that most scanned documents contain a lot of redundant data, which doesn’t provide an...

A Tool for Arabic Documents Indexing and Retrieval From a Web Virtual Library

This paper presents a method for automatic indexing and retrieval of Arabic documents from a virtual library. The library can be multilingual and hold documents written in different languages. All the documents are scanned in order to be stored in the library. The indexing method uses the document contents as indexes. They are first scanned and then submitted to a...

Analysis of nonstationary inventory systems

INFORMATION TO USERS This material was produced from a microfilm copy of the original document. While the most advanced technological means to photograph and reproduce this document have been used, the quality is heavily dependent upon the quality of the original submitted. The following explanation of techniques is provided to help you understand markings or patterns which may appear on this r...

An Automatic Closed-Loop Methodology for Generating Character

Character groundtruth for real, scanned document images is extremely useful for evaluating the performance of OCR systems, training OCR algorithms, and validating document degradation models. Unfortunately, manual collection of accurate groundtruth for characters in a real (scanned) document image is not possible because (i) accuracy in delineating groundtruth character bounding boxes is not hi...

An Automatic Closed-Loop Methodology for Generating Character Groundtruth for Scanned Documents

Character groundtruth for real, scanned document images is crucial for evaluating the performance of OCR systems, training OCR algorithms, and validating document degradation models. Unfortunately, manual collection of accurate groundtruth for characters in a real (scanned) document image is not practical because (i) accuracy in delineating groundtruth character bounding boxes is not high enoug...

Journal:
  • J. UCS

Volume 15

Publication date: 2009